Maternal Health
Doubly robust identification of treatment effects from multiple environments
De Bartolomeis, Piersilvio, Kostin, Julia, Abad, Javier, Wang, Yixin, Yang, Fanny
Treatment effects are key quantities of interest in applied domains such as medicine and social sciences, as they determine the impact of interventions like novel treatments or policies on outcomes of interest. To achieve this goal, researchers often rely on randomized trials since randomizing the treatment assignment guarantees unbiased treatment effect estimates under mild assumptions. However, methods relying on randomized data face several issues, such as small sample sizes, sample populations that do not reflect those seen in the real world, and ethical or financial constraints. As a result, there is growing interest in using observational data to estimate treatment effects. A fundamental challenge in using observational data is the selection of a valid adjustment set, i.e. a set of covariates that can be used to identify and estimate the treatment effect. Although criteria for identifying valid adjustment sets are well-established, they rely on the knowledge of the underlying causal graph. When the graph is not known, practitioners often adjust for all available covariates [5]. Yet, this approach runs the risk of including bad controls--covariates that open backdoor paths between the treatment (T) and the outcome (Y), thereby introducing bias into the treatment effect estimate.
Desert Camels and Oil Sheikhs: Arab-Centric Red Teaming of Frontier LLMs
Saeed, Muhammed, Mohamed, Elgizouli, Mohamed, Mukhtar, Raza, Shaina, Abdul-Mageed, Muhammad, Shehata, Shady
Large language models (LLMs) are widely used but raise ethical concerns due to embedded social biases. This study examines LLM biases against Arabs versus Westerners across eight domains, including women's rights, terrorism, and anti-Semitism and assesses model resistance to perpetuating these biases. To this end, we create two datasets: one to evaluate LLM bias toward Arabs versus Westerners and another to test model safety against prompts that exaggerate negative traits ("jailbreaks"). We evaluate six LLMs -- GPT-4, GPT-4o, LlaMA 3.1 (8B & 405B), Mistral 7B, and Claude 3.5 Sonnet. We find 79% of cases displaying negative biases toward Arabs, with LlaMA 3.1-405B being the most biased. Our jailbreak tests reveal GPT-4o as the most vulnerable, despite being an optimized version, followed by LlaMA 3.1-8B and Mistral 7B. All LLMs except Claude exhibit attack success rates above 87% in three categories. We also find Claude 3.5 Sonnet the safest, but it still displays biases in seven of eight categories. Despite being an optimized version of GPT4, We find GPT-4o to be more prone to biases and jailbreaks, suggesting optimization flaws. Our findings underscore the pressing need for more robust bias mitigation strategies and strengthened security measures in LLMs.
Bayesian Collaborative Bandits with Thompson Sampling for Improved Outreach in Maternal Health Program
Dasgupta, Arpan, Jain, Gagan, Suggala, Arun, Shanmugam, Karthikeyan, Tambe, Milind, Taneja, Aparna
Mobile health (mHealth) programs face a critical challenge in optimizing the timing of automated health information calls to beneficiaries. This challenge has been formulated as a collaborative multi-armed bandit problem, requiring online learning of a low-rank reward matrix. Existing solutions often rely on heuristic combinations of offline matrix completion and exploration strategies. In this work, we propose a principled Bayesian approach using Thompson Sampling for this collaborative bandit problem. Our method leverages prior information through efficient Gibbs sampling for posterior inference over the low-rank matrix factors, enabling faster convergence. We demonstrate significant improvements over state-of-the-art baselines on a real-world dataset from the world's largest maternal mHealth program. Our approach achieves a $16\%$ reduction in the number of calls compared to existing methods and a $47$\% reduction compared to the deployed random policy. This efficiency gain translates to a potential increase in program capacity by $0.5-1.4$ million beneficiaries, granting them access to vital ante-natal and post-natal care information. Furthermore, we observe a $7\%$ and $29\%$ improvement in beneficiary retention (an extremely hard metric to impact) compared to state-of-the-art and deployed baselines, respectively. Synthetic simulations further demonstrate the superiority of our approach, particularly in low-data regimes and in effectively utilizing prior information. We also provide a theoretical analysis of our algorithm in a special setting using Eluder dimension.
Optimizing Vital Sign Monitoring in Resource-Constrained Maternal Care: An RL-Based Restless Bandit Approach
Boehmer, Niclas, Zhao, Yunfan, Xiong, Guojun, Rodriguez-Diaz, Paula, Cibrian, Paola Del Cueto, Ngonzi, Joseph, Boatin, Adeline, Tambe, Milind
Maternal mortality remains a significant global public health challenge. One promising approach to reducing maternal deaths occurring during facility-based childbirth is through early warning systems, which require the consistent monitoring of mothers' vital signs after giving birth. Wireless vital sign monitoring devices offer a labor-efficient solution for continuous monitoring, but their scarcity raises the critical question of how to allocate them most effectively. We devise an allocation algorithm for this problem by modeling it as a variant of the popular Restless Multi-Armed Bandit (RMAB) paradigm. In doing so, we identify and address novel, previously unstudied constraints unique to this domain, which render previous approaches for RMABs unsuitable and significantly increase the complexity of the learning and planning problem. To overcome these challenges, we adopt the popular Proximal Policy Optimization (PPO) algorithm from reinforcement learning to learn an allocation policy by training a policy and value function network. We demonstrate in simulations that our approach outperforms the best heuristic baseline by up to a factor of $4$.
I-SIRch: AI-Powered Concept Annotation Tool For Equitable Extraction And Analysis Of Safety Insights From Maternity Investigations
Singh, Mohit Kumar, Cosma, Georgina, Waterson, Patrick, Back, Jonathan, Jun, Gyuchan Thomas
Maternity care is a complex system involving treatments and interactions between patients, providers, and the care environment. To improve patient safety and outcomes, understanding the human factors (e.g. individuals decisions, local facilities) influencing healthcare delivery is crucial. However, most current tools for analysing healthcare data focus only on biomedical concepts (e.g. health conditions, procedures and tests), overlooking the importance of human factors. We developed a new approach called I-SIRch, using artificial intelligence to automatically identify and label human factors concepts in maternity healthcare investigation reports describing adverse maternity incidents produced by England's Healthcare Safety Investigation Branch (HSIB). These incident investigation reports aim to identify opportunities for learning and improving maternal safety across the entire healthcare system. I-SIRch was trained using real data and tested on both real and simulated data to evaluate its performance in identifying human factors concepts. When applied to real reports, the model achieved a high level of accuracy, correctly identifying relevant concepts in 90\% of the sentences from 97 reports. Applying I-SIRch to analyse these reports revealed that certain human factors disproportionately affected mothers from different ethnic groups. Our work demonstrates the potential of using automated tools to identify human factors concepts in maternity incident investigation reports, rather than focusing solely on biomedical concepts. This approach opens up new possibilities for understanding the complex interplay between social, technical, and organisational factors influencing maternal safety and population health outcomes. By taking a more comprehensive view of maternal healthcare delivery, we can develop targeted interventions to address disparities and improve maternal outcomes.
The OxMat dataset: a multimodal resource for the development of AI-driven technologies in maternal and newborn child health
Khan, M. Jaleed, Duta, Ioana, Albert, Beth, Cooke, William, Vatish, Manu, Jones, Gabriel Davis
The rapid advancement of Artificial Intelligence (AI) in healthcare presents a unique opportunity for advancements in obstetric care, particularly through the analysis of cardiotocography (CTG) for fetal monitoring. However, the effectiveness of such technologies depends upon the availability of large, high-quality datasets that are suitable for machine learning. This paper introduces the Oxford Maternity (OxMat) dataset, the world's largest curated dataset of CTGs, featuring raw time series CTG data and extensive clinical data for both mothers and babies, which is ideally placed for machine learning. The OxMat dataset addresses the critical gap in women's health data by providing over 177,211 unique CTG recordings from 51,036 pregnancies, carefully curated and reviewed since 1991. The dataset also comprises over 200 antepartum, intrapartum and postpartum clinical variables, ensuring near-complete data for crucial outcomes such as stillbirth and acidaemia. While this dataset also covers the intrapartum stage, around 94% of the constituent CTGS are antepartum. This allows for a unique focus on the underserved antepartum period, in which early detection of at-risk fetuses can significantly improve health outcomes. Our comprehensive review of existing datasets reveals the limitations of current datasets: primarily, their lack of sufficient volume, detailed clinical data and antepartum data. The OxMat dataset lays a foundation for future AI-driven prenatal care, offering a robust resource for developing and testing algorithms aimed at improving maternal and fetal health outcomes.
NLP for Maternal Healthcare: Perspectives and Guiding Principles in the Age of LLMs
Antoniak, Maria, Naik, Aakanksha, Alvarado, Carla S., Wang, Lucy Lu, Chen, Irene Y.
Ethical frameworks for the use of natural language processing (NLP) are urgently needed to shape how large language models (LLMs) and similar tools are used for healthcare applications. Healthcare faces existing challenges including the balance of power in clinician-patient relationships, systemic health disparities, historical injustices, and economic constraints. Drawing directly from the voices of those most affected, and focusing on a case study of a specific healthcare setting, we propose a set of guiding principles for the use of NLP in maternal healthcare. We led an interactive session centered on an LLM-based chatbot demonstration during a full-day workshop with 39 participants, and additionally surveyed 30 healthcare workers and 30 birthing people about their values, needs, and perceptions of NLP tools in the context of maternal health. We conducted quantitative and qualitative analyses of the survey results and interactive discussions to consolidate our findings into a set of guiding principles. We propose nine principles for ethical use of NLP for maternal healthcare, grouped into three themes: (i) recognizing contextual significance (ii) holistic measurements, and (iii) who/what is valued. For each principle, we describe its underlying rationale and provide practical advice. This set of principles can provide a methodological pattern for other researchers and serve as a resource to practitioners working on maternal health and other healthcare fields to emphasize the importance of technical nuance, historical context, and inclusive design when developing NLP technologies for clinical use.
Estimating Countries with Similar Maternal Mortality Rate using Cluster Analysis and Pairing Countries with Identical MMR
Nandini, S., R, Sanjjushri Varshini
In the evolving world, we require more additionally the young era to flourish and evolve into developed land. Most of the population all around the world are unaware of the complications involved in the routine they follow while they are pregnant and how hospital facilities affect maternal health. Maternal Mortality is the death of a pregnant woman due to intricacies correlated to pregnancy, underlying circumstances exacerbated by the pregnancy or management of these situations. It is crucial to consider the Maternal Mortality Rate (MMR) in diverse locations and determine which human routines and hospital facilities diminish the Maternal Mortality Rate (MMR). This research aims to examine and discover the countries which are keeping more lavish threats of MMR and countries alike in MMR encountered. Data is examined and collected for various countries, data consists of the earlier years' observation. From the perspective of Machine Learning, Unsupervised Machine Learning is implemented to perform Cluster Analysis. Therefore the pairs of countries with similar MMR as well as the extreme opposite pair concerning the MMR are found.
GE is working on AI-powered ultrasounds to combat pediatric and maternal mortality rates
GE Health says it plans to develop an AI-assisted ultrasound imaging tool that is so easy to use, that even healthcare providers without specialized training will be able to operate it. The device's research and development will be funded by a $44 million grant from the Bill & Melinda Gates Foundation, which has historically invested in the roll-out of new technologies in resource-poor settings to address gaps in healthcare access. GE says the AI-powered imaging technology has been designed to be dispersed to low-and-middle income countries where the services of healthcare providers may be stretched thin. The ultrasound tool will be more effective at providing clear readings of lung and ultrasound scans across maternal and fetal care as well as pediatric lung health. These areas of medicine are particularly notable because maternal and child mortality is mostly preventable if medical intervention occurs early.
Closing the Gap in High-Risk Pregnancy Care Using Machine Learning and Human-AI Collaboration
Mozannar, Hussein, Utsumi, Yuria, Chen, Irene Y., Gervasi, Stephanie S., Ewing, Michele, Smith-McLallen, Aaron, Sontag, David
High-risk pregnancy (HRP) is a pregnancy complicated by factors that can adversely affect outcomes of the mother or the infant. Health insurers use algorithms to identify members who would benefit from additional clinical support. We aimed to build machine learning algorithms to identify pregnant patients and triage them by risk of complication to assist care management. In this retrospective study, we trained a hybrid Lasso regularized classifier to predict whether a patient is currently pregnant using claims data from 36735 insured members of Independence Blue Cross (IBC), a health insurer in Philadelphia. We then train a linear classifier on a subset of 12,243 members to predict whether a patient will develop gestational diabetes or gestational hypertension. These algorithms were developed in cooperation with the care management team at IBC and integrated into the dashboard. In small user studies with the nurses, we evaluated the impact of integrating our algorithms into their workflow. We find that the proposed model predicts an earlier pregnancy start date for 3.54% (95% CI 3.05-4.00) for patients with complications compared to only using a set of pre-defined codes that indicate the start of pregnancy and never later at the expense of a 5.58% (95% CI 4.05-6.40) false positive rate. The classifier for predicting complications has an AUC of 0.754 (95% CI 0.764-0.788) using data up to the patient's first trimester. Nurses from the care management program expressed a preference for the proposed models over existing approaches. The proposed model outperformed commonly used claim codes for the identification of pregnant patients at the expense of a manageable false positive rate. Our risk complication classifier shows that we can accurately triage patients by risk of complication.